kernel interpolation
Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions
Lu, Weihao, Lin, Qian, Xia, Yingcun, Huang, Dongming
Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the largedimensional setting where the sample size and dimension are of comparable order, i.e., n dγ for some γ > 0. We first consider inner-product kernels on the sphere Sd 1 and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions s 0, where smeasures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever sis positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in Rd whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions. Keywords: Spectral algorithms, learning curves, high dimension, benign overfitting. 1 Introduction Nonparametric regression studies the estimation of an unknown function f: Rd R from ni.i.d.
Kernel Interpolation with Sparse Grids
Structured kernel interpolation (SKI) accelerates Gaussian processes (GP) inference by interpolating the kernel covariance function using a dense grid of inducing points, whose corresponding kernel matrix is highly structured and thus amenable to fast linear algebra. Unfortunately, SKI scales poorly in the dimension of the input points, since the dense grid size grows exponentially with the dimension. To mitigate this issue, we propose the use of sparse grids within the SKI framework. These grids enable accurate interpolation, but with a number of points growing more slowly with dimension. We contribute a novel nearly linear time matrix-vector multiplication algorithm for the sparse grid kernel matrix. We also describe how sparse grids can be combined with an efficient interpolation scheme based on simplicial complexes. With these modifications, we demonstrate that SKI can be scaled to higher dimensions while maintaining accuracy, for both synthetic and real datasets.
Sobolev norm inconsistency of kernel interpolation
We study the consistency of minimum-norm interpolation in reproducing kernel Hilbert spaces corresponding to bounded kernels. Our main result give lower bounds for the generalization error of the kernel interpolation measured in a continuous scale of norms that interpolate between $L^2$ and the hypothesis space. These lower bounds imply that kernel interpolation is always inconsistent, when the smoothness index of the norm is larger than a constant that depends only on the embedding index of the hypothesis space and the decay rate of the eigenvalues.
Kernel Interpolation with Sparse Grids
Structured kernel interpolation (SKI) accelerates Gaussian processes (GP) inference by interpolating the kernel covariance function using a dense grid of inducing points, whose corresponding kernel matrix is highly structured and thus amenable to fast linear algebra. Unfortunately, SKI scales poorly in the dimension of the input points, since the dense grid size grows exponentially with the dimension. To mitigate this issue, we propose the use of sparse grids within the SKI framework. These grids enable accurate interpolation, but with a number of points growing more slowly with dimension. We contribute a novel nearly linear time matrix-vector multiplication algorithm for the sparse grid kernel matrix.
Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories
Zhang, Haobo, Lai, Jianfa, Li, Yicheng, Lin, Qian, Liu, Jun S.
A primary advantage of neural networks lies in their feature learning characteristics, which is challenging to theoretically analyze due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing the neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization issue of sufficiently wide neural networks, we examine limitations and implications of the fixed kernel theory (as the NTK theory) and review recent theoretical advancements in feature learning. Moving beyond the fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model to study the feature learning characteristics of neural networks.
High-Dimensional Gaussian Process Regression with Soft Kernel Interpolation
We introduce Soft Kernel Interpolation (SoftKI) designed for scalable Gaussian Process (GP) regression on high-dimensional datasets. Inspired by Structured Kernel Interpolation (SKI), which approximates a GP kernel via interpolation from a structured lattice, SoftKI approximates a kernel via softmax interpolation from a smaller number of learned interpolation (i.e, inducing) points. By abandoning the lattice structure used in SKI-based methods, SoftKI separates the cost of forming an approximate GP kernel from the dimensionality of the data, making it well-suited for high-dimensional datasets. We demonstrate the effectiveness of SoftKI across various examples, and demonstrate that its accuracy exceeds that of other scalable GP methods when the data dimensionality is modest (around $10$).